Introduce a cache for Publishers that tracks subscriptions to manage the cache #2861
Conversation
servicetalk-concurrent-api/src/main/java/io/servicetalk/concurrent/api/PublisherCache.java
        });
    }

    private static final class Holder<T> {
I think this is here so that we can add it to the `HashMap` and avoid some jumping through hoops to manage the reference equality stuff for `HashMap`. A quick code comment might be helpful for future readers.
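A rough plain-Java sketch of that idea (the names are illustrative, not the PR's actual code): the map value is a mutable, identity-based wrapper, so an entry can be installed first and its value filled in afterwards without relying on the value's own equals/hashCode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class HolderCacheSketch<K, V> {
    // Identity-based wrapper: the map stores the Holder, not the value itself, so the
    // value can be assigned after the entry is already present in the map.
    private static final class Holder<V> {
        V value;
    }

    private final Map<K, Holder<V>> cache = new HashMap<>();

    V getOrCreate(K key, Function<K, V> supplier) {
        synchronized (cache) {
            Holder<V> holder = cache.computeIfAbsent(key, k -> new Holder<>());
            if (holder.value == null) {
                holder.value = supplier.apply(key);
            }
            return holder.value;
        }
    }
}
```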
                });

        item2.publisher = multicastStrategy.apply(newPublisher)
                .liftSync(subscriber -> new Subscriber<T>() {
Do we need this second `liftSync`? I think it applies to the individual streams. In my mind's eye that means if a single stream ends it removes the underlying stream from the cache. I don't know how that would happen unless the parent publisher completed and we remove it at that level as well.
It's there to handle subscriber errors and clean up if the subscriber is put into a bad state via a throw. In fact it looks like I missed a case with onNext.
Is there a test case that would demonstrate their purpose? If I delete the second `liftSync` and move the `synchronized` to the first, everything still works as expected.
I will add one.
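For context, a plain-JDK sketch of the cleanup-on-throw behavior being discussed (it uses `java.util.concurrent.Flow` rather than ServiceTalk's `Subscriber`, and `evictFromCache` is a hypothetical hook, not the PR's code): if the downstream subscriber throws, the wrapper evicts the cache entry and cancels upstream so the broken stream is not handed out again.

```java
import java.util.concurrent.Flow;

final class CleanupOnThrowSketch<T> implements Flow.Subscriber<T> {
    private final Flow.Subscriber<? super T> delegate;
    private final Runnable evictFromCache; // hypothetical hook: removes the entry under the cache lock
    private Flow.Subscription subscription;

    CleanupOnThrowSketch(Flow.Subscriber<? super T> delegate, Runnable evictFromCache) {
        this.delegate = delegate;
        this.evictFromCache = evictFromCache;
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        delegate.onSubscribe(s);
    }

    @Override
    public void onNext(T item) {
        try {
            delegate.onNext(item);
        } catch (Throwable t) {
            evictFromCache.run();   // the cached stream is in a bad state, drop it
            subscription.cancel();  // and stop the upstream as well
            throw t;
        }
    }

    @Override
    public void onError(Throwable t) {
        evictFromCache.run();       // terminal signal: the entry is no longer reusable
        delegate.onError(t);
    }

    @Override
    public void onComplete() {
        evictFromCache.run();
        delegate.onComplete();
    }
}
```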
@mgodave - can you add a comment along the lines of:
Motivation for this being "after" the multicast is because multicast doesn't propagate cancellation upstream unless there are no subscribers (e.g. they all cancel) ... so we acquire the lock in cancel here, there are no async boundaries in multicast, and then we remove from the map in cancel "above" multicast. This prevents race conditions where someone does a `get` and we return a `Publisher` that has been cancelled (because there are no subscribers).
Also consider breaking this out into a named (e.g. not anonymous, private/final) class which is easier to look at when debugging larger operator chains.
Also consider breaking this out into a named (e.g. not anonymous, private/final) class which is easier to look at when debugging larger operator chains.
These are inline because they access four different pieces of local state and properties. I played around with breaking this out and it doesn't feel cohesive. I'm choosing to leave it as is for now but I am happy to revisit if you have strong feelings.
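A plain-Java sketch of the synchronization order described above (not ServiceTalk code, names invented): because the removal triggered by cancellation runs synchronously under the same lock as `get`, a caller can never be handed an entry whose last subscriber has already cancelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class GuardedRemovalSketch<K, V> {
    private final Map<K, V> cache = new HashMap<>();

    V get(K key, Function<K, V> supplier) {
        synchronized (cache) {
            // Either the entry is still live, or the cancel path already removed it under
            // this same lock and we create a fresh one; there is no window in between.
            return cache.computeIfAbsent(key, supplier);
        }
    }

    // Invoked from the cancel path "above" multicast, i.e. only once multicast has no
    // subscribers left and propagates cancellation upstream, with no async boundary in between.
    void onAllSubscribersCancelled(K key) {
        synchronized (cache) {
            cache.remove(key);
        }
    }
}
```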
                    @Override
                    public void cancel() {
                        try {
                            assert Thread.holdsLock(publisherCache);
consider adding a comment here too (the lock was acquired after the multicast; we need to be holding the lock here to interact with the map and prevent returning a cancelled Publisher)
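A tiny sketch of the invariant such a comment would capture (hypothetical names, not the PR's code): the method only touches the map while the caller already holds the lock, and the assert makes that assumption visible and checked when assertions are enabled (`-ea`).

```java
import java.util.HashMap;
import java.util.Map;

final class LockInvariantSketch<K, V> {
    private final Map<K, V> publisherCache = new HashMap<>();

    // Must only be called while synchronized on publisherCache: the lock is taken in the
    // cancel path after multicast, and holding it here prevents get() from observing and
    // returning an entry that is about to be removed because it was cancelled.
    private void removeFromMapLocked(K key) {
        assert Thread.holdsLock(publisherCache);
        publisherCache.remove(key);
    }
}
```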
     */
    public Publisher<T> get(final K key, final Function<K, Publisher<T>> publisherSupplier) {
        return Publisher.defer(() -> {
            synchronized (publisherCache) {
`synchronized` has been used in the past because it doesn't require additional allocations; however, Loom's virtual threads don't work well with `synchronized` (blocking inside it pins the carrier thread). Should we use `Lock` objects instead?
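A sketch of the `Lock`-based alternative being raised (same shape as the `get` above, but guarded by a `ReentrantLock` rather than `synchronized (publisherCache)`): a virtual thread can block on a `java.util.concurrent.locks.Lock` without pinning its carrier thread, at the cost of one extra allocation per cache instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

final class LockBasedCacheSketch<K, V> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<K, V> cache = new HashMap<>();

    V get(K key, Function<K, V> supplier) {
        lock.lock();
        try {
            // Same critical section as the synchronized version; only the lock primitive changes.
            return cache.computeIfAbsent(key, supplier);
        } finally {
            lock.unlock();
        }
    }
}
```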
    }

    private void lockRemoveFromMap() {
        synchronized (publisherCache) {
add a comment here to the effect:
- completion of the first Subscriber after multicast means that the multicast operator is in a terminal state and we therefore remove it from the map. There are cases where folks may want to re-subscribe to the Publisher (e.g. get the cached value, trigger another event), however that currently isn't supported and we favor bounding the size of the map, which has scope outside the operator chain.
     * @param <T> the type of the {@link Publisher} contained in the cache.
     * @return a new PublisherCache that will wrap cached values with multicast operator.
     */
    public static <K, T> PublisherCache<K, T> multicast() {
by exposing these static methods we may need to add a new factory method here for each operator overload. did you consider instead exposing just a `Function<>` (or similar) so folks can apply the variant they want (see the signature sketch below)? some risks maybe:
- folks could apply operators that don't obey the assumptions (we could clarify the constraints the operator must abide by ... allows for multiple subscribers, cancels upstream only after no subscribers are present, no async cancel processing since the synchronization here depends upon it, ..)
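A hypothetical signature sketch of that alternative (names invented for illustration, and `CacheSketch` is just a stand-in for the PR's `PublisherCache` so the snippet is self-contained): the caller supplies the operator applied to each cached `Publisher`, and that operator would have to obey the constraints listed above.

```java
import io.servicetalk.concurrent.api.Publisher;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Stand-in for the PR's PublisherCache type, for illustration only.
interface CacheSketch<K, T> {
    Publisher<T> get(K key, Function<K, Publisher<T>> publisherSupplier);
}

final class PublisherCacheFactorySketch {
    // Instead of one static factory per operator (multicast(), replay(), ...), accept the
    // "multicast strategy" directly, e.g. create(p -> p.multicast(1)), assuming the caller
    // picks an operator that allows multiple subscribers and only cancels upstream once no
    // subscribers remain.
    static <K, T> CacheSketch<K, T> create(UnaryOperator<Publisher<T>> multicastStrategy) {
        throw new UnsupportedOperationException("signature sketch only");
    }
}
```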
I did consider this. I'm open to removing these; the `create()` method just above is essentially what you describe: the user would specify the `Publisher`, configured however they desire, in the function passed to `get`.
I had initially built this without the static methods, with a constructor that took a function which would be used to supply a new Publisher on a cache miss. I decided to move this function to the `get` method as it emulated how I might expect to use a cache, e.g. I might not want a function from name -> Publisher, but rather a closure that would allow me to use the context at hand to instantiate the new object.
For an initial API I don't have any problem deferring to your suggestion as we learn how this ultimately ends up being used.
Motivation:
Handling the caching of Publishers comes up often, and managing the cache correctly can be tricky and error prone. Scenarios where caching of Publishers is useful include those similar to the multicast and replay operators, but with the added dimension of asynchronous access, for instance multiple requests that need to consume the same data.
Modifications:
Add a PublisherCache utility that manages the lifecycle of a cached Publisher. A Publisher is removed from the cache when it no longer has any subscriptions.
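A rough usage sketch based on the API shown in this PR (`PublisherCache.multicast()` and `get(key, supplier)`); `fetchEvents` is a hypothetical stand-in for whatever creates the underlying stream on a cache miss.

```java
import io.servicetalk.concurrent.api.Publisher;
import io.servicetalk.concurrent.api.PublisherCache;

final class PublisherCacheUsageSketch {
    private final PublisherCache<String, String> cache = PublisherCache.multicast();

    // Concurrent callers asking for the same key share one cached, multicast Publisher;
    // the entry is evicted once it no longer has any subscriptions.
    Publisher<String> events(String topic) {
        return cache.get(topic, this::fetchEvents);
    }

    // Hypothetical: whatever actually produces the stream for a given key.
    private Publisher<String> fetchEvents(String topic) {
        return Publisher.from("event-1-" + topic, "event-2-" + topic);
    }
}
```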